ABSTRACT

A low complexity, open-loop, discrete-time,
delay-multiply-average (DMA) technique for estimating the
frequency offset for digitally modulated MPSK signals is 
investigated. A nonlinearity is used to remove the MPSK 
modulation and generate the carrier component to be 
extracted. Theoretical and simulated performance results 
are presented and compared to the Cramer-Rao lower 
bound (CRLB) for the variance of the frequency estimation 
error. For all signal-to-noise ratios (SNRs) above 
threshold, it is shown that the CRLB can essentially be 
achieved with linear complexity. 

INTRODUCTION 

Most conventional burst transmission systems with 
frequency uncertainty provide a preamble of unmodulated 
carrier and/or a carrier modulated with a known symbol 
pattern, for initial frequency estimation and 
synchronization purposes. There are also many other 
applications where it is desirable to estimate the frequency 
error from a modulated signal with unknown data. In either 
case, it is desirable to have a fast, efficient, and accurate 
frequency estimation algorithm, both for initial acquisition 
and tracking purposes. 

In this paper, a low complexity, open-loop, discrete- 
time, delay-multiply-average (DMA) approach to 
estimating the frequency offset for digitally modulated 
signals is investigated. M-ary phase shift keyed (MPSK)
signaling formats are considered. An M-power-type
nonlinearity can be used to generate a carrier component
when the data symbols are unknown. The special case of
pure carrier and/or known symbols is included by setting
M = 1. Performance is theoretically approximated and
compared to the Cramer-Rao lower bound (CRLB) for the 
variance of the frequency estimation error. Simulated 
performance is also presented and compared to the 
theoretical approximations and bounds. It is shown that, 
when optimum delays are employed, performance is within 
about 0.5 dB of the CRLB for all signal-to-noise ratios 
(SNRs) above threshold. A simple extension to the DMA 
algorithm, which approximates true maximum-likelihood 
(ML) estimation, is also examined. With the ML
extension, the CRLB is essentially achieved for all SNRs
above threshold.

Previously known open-loop techniques which provide 
performance close to the CRLB typically involve some 
form of fast Fourier transform (FFT) processing [1]. The
complexity of FFT based algorithms is order KL log2(KL),
where K is the observation time in samples and L is the
zero-stuffing factor required to obtain the desired frequency
resolution using an FFT of size KL. Small L values of 2 or
4 are usually recommended when the FFT is used only for a
coarse search [1]. To approach the CRLB, additional
processing is required to perform a fine search for the peak
of the likelihood function. The complexity of the DMA
based algorithm presented in [2] is order KB, where B is the
number of DMA branches employed. The number of
branches required depends on the desired threshold SNR,
but can typically be made fewer than log2(K) for many
applications. For example, 3 branches were found to be
sufficient for the MSAT application described in [3], with
K = 100. This paper presents a modified version of the basic
DMA algorithm described in [2] and a simple ML 
extension. In addition to providing improved performance, 
the complexities of the new DMA algorithm and its ML 
extension are both of order K. 

FREQUENCY ESTIMATION 

Single Branch DMA Approach 

Figure 1 shows an open-loop frequency phasor 
estimator, based on the DMA approach. The sampled 
(discrete) complex baseband received signal, {r_k}, is
modeled as

    r_k = A a_k \exp(j \omega k) + w_k = A a_k W^k + w_k        (1)

where the complex phasor, W, is defined as

    W = \exp(j \omega)        (2)

A is the signal's complex amplitude, a_k represents the
MPSK modulation data symbols, given by

    a_k = \exp(j 2 \pi m / M),  m \in \{0, ..., M-1\}        (3)

ω is the frequency offset measured in radians per sample or
symbol period, T, and w_k is additive noise.
Fig. 1: Single branch DMA frequency estimator


The sample SNR at the receiver input is defined as

    \gamma = P_r / P_w = |A|^2 / E[|w_k|^2]        (4)

where E[.] denotes the expected value operator. For
mathematical convenience, and without any loss in
generality, it is assumed that |A| = 1, so that P_r = 1 and
P_w = 1/γ.

The received signal is first passed through a
generalized M-power-type nonlinearity to remove the
MPSK modulation. The nonlinearity is generalized in the
sense that the phase is multiplied by M but the amplitude
can be raised to a different power, namely M_a. From (1),
the signal at the output of the nonlinearity is given by

    s_k = r_k^M |r_k|^{M_a - M} = A^M |A|^{M_a - M} W^{Mk} + n_k        (5)

This nonlinearity is equivalent to that introduced in [4] for
carrier phase estimation. The noise term, n_k, is quite
complicated in general. Although simulation results are
presented for different values of M_a and M, the theoretical
approximations are restricted to the case of M_a = M. With
this restriction, n_k is given by

    n_k = \sum_{m=1}^{M} \binom{M}{m} (A a_k W^k)^{M-m} w_k^m        (6)

The objective is to obtain an estimate of W, since this
phasor contains the phase rotation over a single sample
period due to the frequency offset, ω. Multiplying the
received signal samples, {r_k}, by the sequence {(W*)^k} would
remove the frequency offset. An estimate of

    Z = W^{Md}        (7)

is obtained first, and is given by

    \hat{Z} = \sum_{k=d+1}^{K} s_k s_{k-d}^*        (8)

where K is the number of samples used in the measurement,
and d is the delay in sample periods. The estimate of W is
then given by

    \hat{W} = [\hat{Z}]^{1/(Md)}        (9)

In the absence of noise and possible phase ambiguities
associated with multiple complex roots, it is clear that
Ẑ = Z and Ŵ = W, to within a positive real scale factor.
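
As an illustration of (5) and (7)-(9), the following is a minimal Python sketch of the single branch estimator, not the authors' implementation. The function name dma_single_branch and the test signal are hypothetical, and the sum in (8) is assumed to be unnormalized, which affects only the amplitude of the estimate and not its phase.

```python
import cmath

def dma_single_branch(r, M=1, Ma=1, d=1):
    """Single branch DMA estimate of Z = W**(M*d) and W = exp(j*omega)."""
    # Generalized M-power nonlinearity, eq. (5): phase times M, amplitude to the power Ma.
    s = [abs(x) ** Ma * cmath.exp(1j * M * cmath.phase(x)) for x in r]
    # Delay-multiply-average, eq. (8): sum of s_k * conj(s_{k-d}) for k = d+1..K.
    z_hat = sum(s[k] * s[k - d].conjugate() for k in range(d, len(s)))
    # Principal (M*d)-th root, eq. (9); unambiguous only if |omega| < pi/(M*d).
    n = M * d
    w_hat = cmath.rect(abs(z_hat) ** (1.0 / n), cmath.phase(z_hat) / n)
    return z_hat, w_hat

# Noise-free QPSK example (symbol indices m_k chosen arbitrarily).
omega = 0.05
symbols = [0, 3, 1, 2, 2, 0, 1, 3, 0, 1, 2, 3]
r = [cmath.exp(1j * (omega * k + 2 * cmath.pi * m / 4))
     for k, m in enumerate(symbols, start=1)]
z_hat, w_hat = dma_single_branch(r, M=4, Ma=4, d=1)
omega_hat = cmath.phase(w_hat)   # estimate in radians per sample
```

With unknown QPSK data the fourth-power nonlinearity strips the modulation, so the phase of the averaged product depends only on the frequency offset.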


Multiple Branch DMA Approach 

There is a fundamental phase ambiguity problem 
associated with all frequency estimators of this type. 
Without a previous estimate for guidance, the maximum 
resolvable frequency offset is less than 1/(2TMd) Hz. The
larger the delay, the more potential phase ambiguities. The
phase ambiguity problem results from not knowing which
of the Md complex roots to choose. In most cases the
ambiguity can be resolved by employing a ball-park
estimate to guide the selection of the appropriate complex
root. Given a previous estimate, obtained using delay d_{b-1},
a new estimate, using delay d_b > d_{b-1}, can be obtained as
follows

    \hat{W}_b = \hat{W}_{b-1} [ \hat{Z}_b / \hat{W}_{b-1}^{M d_b} ]^{1/(M d_b)}        (10)

If the delays are selected such that

    d_b = p_b d_{b-1},  b = 2...B        (11)

where p_b is an integer greater than or equal to 2, then (10)
is equivalent to

    \hat{W}_b = \hat{W}_{b-1} [ \hat{Z}_b / \hat{Z}_{b-1}^{p_b} ]^{1/(M d_b)}        (12)


If the root operation in (10) or (12) always takes the
principal root and the phase difference between the current
and previous estimate is within π/(M d_b), which is the
maximum resolvable phase difference with delay d_b, then
the overall result corresponds to the correct root and the
phase ambiguity is resolved. If the previous phase error is
too large to resolve the phase ambiguity, then the incorrect
root which is closest to the previous estimate will be
selected. Equations (10) and (12) are clearly equivalent to
(9) if the appropriate root is selected.
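
The root selection of (10) can be sketched as follows. This is an illustrative Python fragment under the assumption that (10) divides the new Z estimate by the previous phasor estimate raised to the power M d_b before taking the principal root; the names principal_root and refine_phasor are hypothetical.

```python
import cmath

def principal_root(z, n):
    """Principal n-th root of the complex number z."""
    return cmath.rect(abs(z) ** (1.0 / n), cmath.phase(z) / n)

def refine_phasor(w_prev, z_b, M, d_b):
    """Eq. (10): choose the (M*d_b)-th root of z_b closest to the previous estimate."""
    w_prev = w_prev / abs(w_prev)            # use only the phase of the previous estimate
    residual = z_b / w_prev ** (M * d_b)     # small residual rotation if w_prev is close
    return w_prev * principal_root(residual, M * d_b)

# Noise-free example: true offset 0.9 rad/sample, delay 8, coarse estimate 0.85.
omega, M, d_b = 0.9, 1, 8
z_b = cmath.exp(1j * omega * M * d_b)
naive = principal_root(z_b, M * d_b)                         # ignores the ambiguity
guided = refine_phasor(cmath.exp(1j * 0.85), z_b, M, d_b)    # uses the coarse estimate
```

Here the coarse error of 0.05 rad is inside the resolvable range of π/(M d_b), so the guided root recovers the true offset, while the unguided principal root lands on a different one of the M d_b candidate roots.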


The new DMA based algorithm is depicted in Figure 2. 
The approach is similar to that given in [2], in that multiple 
DMA branches are used to resolve potential phase 
ambiguities as the branch delays increase. The method 
shown for resolving phase ambiguities is that of (12). This 
method can be used because the delays are specifically 
chosen to be increasing powers of 2, resulting in p_b = 2 for
each branch. The major difference between the DMA
approach of Figure 2 and the DMA approach of [2] is the
rotate-add-decimate (RAD) operation, which is performed
repeatedly on the signal, s_k, at the output of the
nonlinearity.

To simplify the description of the technique, the 
observation time in samples is restricted to be 

    K = 3 \times 2^{B-2},  B \ge 2        (13)

Fig. 2: Bank of B frequency estimators with rotate-add-decimate
(RAD) processing.

where it is assumed that at least the bottom 2 branches
shown in Figure 2 are employed. More general values of K
can be accommodated, but the values of K given in (13) are
the most convenient. The desired delays in original
samples for the B branches are


    d_b = 2^{b-1},  b = 1...B        (14)

The RAD operation always decimates by 2. Thus the
corresponding delays in decimated samples for the B
branches are given by

    D_b = 1,  b = 1...B-1
        = 2,  b = B        (15)


Only 3 samples are processed in the final 2 branches and 
the RAD operation is not used between the last 2 branches. 
This is why a delay of 2 samples is used in the final branch. 
In [2], it is shown that the optimum delay for the final 
branch is 2/3 the number of samples. 


The idea behind the RAD operation is to pseudo-coherently
combine sample pairs to improve the sample
SNR by approximately 3 dB, while simultaneously
lowering the complexity by reducing the number of
samples to be processed later. The RAD operation
performed after the b-th branch is given by

    s_{k,b+1} = s_{2k-1,b} + z_b s_{2k,b},  k = 1...K_{b+1}        (16)

where

    z_b = \hat{Z}_b^* / |\hat{Z}_b|        (17)

is the unit amplitude rotation factor applied after the b-th
branch, and

    K_b = 3 \times 2^{B-b-1},  b = 1...B-1
        = 3,  b = B        (18)

is the number of decimated samples used to estimate Z in
the b-th branch. The RAD operation performed after the b-th
branch requires only K_b/2 multiplies and adds. The RAD
operation removes the estimated frequency error from the
input signal in a pairwise fashion, enabling approximate
coherent combining. The estimated frequency error is not
completely removed, as this would require about 2K_b
multiplies. The RAD operation also has an interesting
frequency domain interpretation. It is equivalent to 
performing down-conversion, low-pass filtering with a 
100% roll-off root-raised-cosine (RRC) filter, decimating- 
by-2, followed by upconversion or reintroduction of the 
frequency error. After decimation, the actual frequency 
error may lie within one of the aliased spectra. The 
processing used to select the correct root is equivalently 
selecting the appropriate aliased spectrum. 
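
The complete bank of branches with RAD processing can be sketched in Python as below. This is an illustrative reading of (12)-(18), not the authors' code: the function name dma_rad_estimate is hypothetical, the rotation factor is assumed to be the conjugate of the branch's Z estimate normalized to unit amplitude, and the ambiguity resolution of (12) is carried out on phases only.

```python
import cmath

def dma_rad_estimate(r, M=1, Ma=1):
    """Frequency estimate (radians/sample) from K = 3 * 2**(B-2) samples, B >= 2."""
    K = len(r)
    n = K // 3
    if K < 3 or 3 * n != K or n & (n - 1):
        raise ValueError("K must equal 3 * 2**(B - 2) with B >= 2")
    # M-power nonlinearity, eq. (5).
    s = [abs(x) ** Ma * cmath.exp(1j * M * cmath.phase(x)) for x in r]
    d = 1            # delay d_b in original samples, eq. (14)
    z_prev = None
    omega = 0.0
    while True:
        # Branches 1..B-1: delay D_b = 1 on the current (decimated) samples.
        z = sum(s[k] * s[k - 1].conjugate() for k in range(1, len(s)))
        if z_prev is None:
            omega = cmath.phase(z) / M
        else:
            # Eq. (12) with p_b = 2: residual phase relative to the previous branch.
            omega += cmath.phase(z * z_prev.conjugate() ** 2) / (M * d)
        z_prev = z
        if len(s) == 3:
            break
        # RAD, eqs. (16)-(17): rotate the second sample of each pair, add, decimate by 2.
        rot = z.conjugate() / abs(z)
        s = [s[2 * k] + rot * s[2 * k + 1] for k in range(len(s) // 2)]
        d *= 2
    # Final branch B: delay D_B = 2 on the same 3 samples, no RAD in between.
    d *= 2
    z = s[2] * s[0].conjugate()
    omega += cmath.phase(z * z_prev.conjugate() ** 2) / (M * d)
    return omega

# Noise-free pure carrier, K = 48 (B = 6, final delay d_B = 32).
est = dma_rad_estimate([cmath.exp(1j * 0.3 * k) for k in range(1, 49)])
```

Each pass halves the number of samples, so the work per branch shrinks geometrically, which is where the order K complexity comes from.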


The majority of the processing is that required to 
compute the Z estimates for each branch. The total number 
of complex multiplies and adds is 
    #mult = 3K - B - 4
    #adds = 3K - 2B - 4        (19)

which indicates a complexity of only order K . 
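
The counts in (19) can be checked by tallying the operations branch by branch. The short Python sketch below assumes that each Z estimate over K_b samples with delay D_b costs K_b - D_b multiplies and K_b - D_b - 1 adds, and that a RAD stage follows branches 1 to B-2 at K_b/2 multiplies and adds each; the function names are hypothetical.

```python
def dma_rad_schedule(B):
    """Return K and a list of (b, d_b, D_b, K_b) per eqs. (13), (14), (15), (18)."""
    K = 3 * 2 ** (B - 2)
    branches = []
    for b in range(1, B + 1):
        K_b = 3 * 2 ** (B - b - 1) if b < B else 3
        D_b = 1 if b < B else 2
        d_b = 2 ** (b - 1)
        branches.append((b, d_b, D_b, K_b))
    return K, branches

def operation_counts(B):
    """Tally complex multiplies and adds for the Z estimates and RAD stages."""
    K, branches = dma_rad_schedule(B)
    mults = adds = 0
    for b, d_b, D_b, K_b in branches:
        mults += K_b - D_b          # products in the Z_b sum
        adds += K_b - D_b - 1       # additions needed to accumulate them
        if b <= B - 2:              # a RAD stage follows branches 1..B-2
            mults += K_b // 2
            adds += K_b // 2
    return K, mults, adds

K, mults, adds = operation_counts(6)   # K = 48, the observation length used in the simulations
```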


Maximum Likelihood Extension 

Consider the pure tone case with M_a = M = 1, so that s_k = r_k
as defined in (1) and (5). The additive noise is assumed to
be white and Gaussian with n_k = w_k. The maximum
likelihood (ML) frequency estimator finds the frequency
ω̂_ML = u which maximizes the function

    f(u) = |S(u)|^2 = S(u) S^*(u)        (20)

where

    S(u) = \sum_{k=1}^{K} s_k U^{-k}        (21)

is the Fourier transform of {s_k} with U defined as

    U = \exp(ju)        (22)

Newton's method can be used to find the maximum of f(u)
by finding the zero-crossing of the first derivative of f(u),
provided the initial guess is close to the peak of the main
lobe of f(u). A good initial guess is given by the frequency
estimate ω̂_B = phase(Ŵ_B) from the final branch of the
DMA based estimator of Figure 2. The simulation results
show that there is little to be gained by using more than a
single step of Newton's method. Thus, an approximate ML
extension to the DMA based frequency estimator of Figure
2 is given by

    \hat{\omega}_{ML} \approx \hat{\omega}_B - f'(\hat{\omega}_B) / f''(\hat{\omega}_B)        (23)

The first and second derivatives of f(u) are given by

    f'(u) = 2 Re[S'(u) S^*(u)]
    f''(u) = 2 |S'(u)|^2 + 2 Re[S''(u) S^*(u)]        (24)

where the n-th derivative of S(u) is given by

    S^{(n)}(u) = \frac{d^n}{du^n} S(u) = \sum_{k=1}^{K} (-jk)^n s_k U^{-k}        (25)

Combining the above results to further simplify (23) gives

    \hat{\omega}_{ML} \approx \hat{\omega}_B + Im[S_0 S_1^*] / (|S_1|^2 - Re[S_0 S_2^*])        (26)

where the 3 sums, S_0, S_1 and S_2, are defined as

    S_n = \sum_{k=1}^{K} k^n s_k w_B^{-k},  n = 0, 1, 2        (27)

with the definition that

    w_B = \hat{W}_B / |\hat{W}_B| = \exp(j \hat{\omega}_B)        (28)

With a few further minor manipulations to the sums in (27), 
it can be shown that the total number of multiplies and adds 
required to implement the ML extension is upper bounded 
by 

    #mult = #adds = 2.5K        (29)

Thus, the complexity of the ML extension is also order K.
The ML extension can also be applied to one of the sets of
K_b decimated samples. With this slight modification, the

complexity of the ML extension can be reduced even 
further to that of a constant. It is shown in the next section 
that the performance penalty with this modification is very 
small. 
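
A single Newton step of the ML extension can be sketched in Python as follows. The sketch evaluates the three weighted sums at the initial estimate and applies the Newton update using the first and second derivatives of f(u) stated above; the function name ml_refine is hypothetical, and the pure tone case (M_a = M = 1) is assumed. The sums are computed directly here, without the further manipulations that lead to the 2.5K operation count.

```python
import cmath

def ml_refine(s, omega_b):
    """One Newton step toward the peak of f(u) = |S(u)|**2, starting from omega_b."""
    S0 = S1 = S2 = 0j
    for k, x in enumerate(s, start=1):
        t = x * cmath.exp(-1j * omega_b * k)   # s_k * w_B**(-k)
        S0 += t
        S1 += k * t
        S2 += k * k * t
    # With S' = -j*S1 and S'' = -S2 evaluated at omega_b:
    f1 = 2.0 * (S1 * S0.conjugate()).imag                      # f'(omega_b)
    f2 = 2.0 * (abs(S1) ** 2 - (S2 * S0.conjugate()).real)     # f''(omega_b)
    return omega_b - f1 / f2

# Noise-free tone at 0.3 rad/sample, K = 48, initial estimate off by 0.005.
tone = [cmath.exp(1j * 0.3 * k) for k in range(1, 49)]
first = ml_refine(tone, 0.305)
second = ml_refine(tone, first)
```

In this noise-free example one step removes most of the initial error, which is in line with the observation that further Newton steps add little.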

THEORETICAL ANALYSIS 

For the theoretical results which follow it is assumed
that the noise samples, {w_k}, are Gaussian and
uncorrelated, that M_a = M in the nonlinearity, and that all
potential phase ambiguities are correctly resolved. An
approximation for the variance of the frequency estimator
shown in Figure 1, measured in (radians/T)^2, was derived in
[2]. The approximation is most accurate for high SNRs
and/or long observation times, when the true angular
variance of Ŵ is small. The result is

    V(K,d,N) = \frac{N \min(d, K-d)}{(K-d)^2 d^2 M^2} + \frac{N^2}{2 (K-d) d^2 M^2}   (rad/T)^2        (30)

where

    N = \sum_{m=1}^{M} \binom{M}{m}^2 m! \gamma^{-m}        (31)

is the power of the noise terms defined in (6).


The frequency estimate variance for each of the
branches shown in Figure 2 can be approximated by

    V_b = (K_b/K)^2 V(K_b, D_b, N_b),  b = 1...B        (32)

where K, D_b, and K_b are as defined in (13), (15), and (18),
respectively. The scale factor in (32) is required to convert
from decimated sample periods back to original sample
periods, T, to preserve the units of (radians/T)^2. The N_b
term represents the effective noise power at the input to the
b-th branch. For SNRs above threshold, the frequency
estimation error remaining after each branch is typically
well within the 3 dB bandwidth of the 100% roll-off RRC
filter used in the following RAD operation. Since this filter
cuts the noise power in half each time it is applied, a good
approximation for N_b, for SNRs above threshold, is

    N_b = (K_b/K) N,  b = 1...B        (33)

where K and K_b are again given by (13) and (18). For the
final branch in Figure 2, the approximation becomes

    V_B = (K_B/K)^2 V(K_B, D_B, N_B)
        = (3/K)^2 V(3, 2, 3N/K)
        = \frac{27 N}{4 K^3 M^2} [ 1 + \frac{3N}{2K} ]   (rad/T)^2        (34)


For high SNRs (or for all SNRs with M = 1), N can be
approximated by the first term in (31), which gives

    N(\gamma \gg 1) = M^2 \gamma^{-1}        (35)

With this approximation

    V_B(\gamma \gg 1) = \frac{27}{4 K^3 \gamma}   (rad/T)^2        (36)

Note that the variance at high SNRs is not a function of M. 
For low SNRs the extra noise terms become more 
significant and performance does depend on M. However, 
for the new DMA frequency estimator with RAD, the last 
noise term in (34) is reduced by an additional factor of K 
which is not present for the frequency estimator presented 
in [2]. At low SNRs, where large values of K are typically 
required, this improvement can be very significant. 
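
The variance approximations can be evaluated numerically with a short Python sketch. It assumes a reading of (30) in which the first term is N min(d, K-d) / ((K-d)^2 d^2 M^2) and the second is N^2 / (2 (K-d) d^2 M^2), and a reading of (31) as a sum of squared binomial coefficients times m! times the SNR to the power -m; both readings reproduce the final branch result (34) and the high SNR limit (36). The function names are hypothetical.

```python
from math import comb, factorial

def noise_power(M, snr):
    """Assumed form of eq. (31): power of the noise terms after the M-power nonlinearity."""
    return sum(comb(M, m) ** 2 * factorial(m) * snr ** (-m) for m in range(1, M + 1))

def dma_variance(K, d, N, M):
    """Assumed form of eq. (30): variance in (radians/T)**2 for K samples and delay d."""
    first = N * min(d, K - d) / ((K - d) ** 2 * d ** 2 * M ** 2)
    second = N ** 2 / (2 * (K - d) * d ** 2 * M ** 2)
    return first + second

def final_branch_variance(K, M, snr):
    """Eqs. (32)-(34): 3 decimated samples, delay 2, effective noise power 3N/K."""
    N = noise_power(M, snr)
    return (3 / K) ** 2 * dma_variance(3, 2, 3 * N / K, M)

# Pure carrier, K = 48, sample SNR of 10 (linear scale, i.e. 10 dB).
v_final = final_branch_variance(48, 1, 10.0)
v_high_snr = 27 / (4 * 48 ** 3 * 10.0)   # eq. (36)
```

For M = 1 the two values differ only by the small second-order noise term, which is scaled down by the additional factor of K noted above.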


The Cramer-Rao lower bound (CRLB) on the variance
of any discrete-time frequency estimator is given by [2, 5]

    CRLB(K, \gamma) = \frac{6}{K (K^2 - 1) \gamma}   (rad/T)^2        (37)

Comparing this with (36), the degradation in dB relative to
the CRLB for the frequency estimator of Figure 2, at high
SNRs, is given by

    Deg(\gamma \gg 1) = 10 \log( \frac{9 (K^2 - 1)}{8 K^2} )   dB        (38)

For large observation times, K >> 1, the degradation from
the CRLB is approximately 10 log(9/8) = 0.5 dB. Note that
there is no degradation from the CRLB with K = 3. The
simulation results show that the performance of the new
DMA frequency estimator with RAD remains very close to
the CRLB for all SNRs above threshold.
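
The high SNR degradation can be checked numerically. The Python sketch below takes the CRLB as 6 / (K (K^2 - 1) γ) and the high SNR variance of the final branch as 27 / (4 K^3 γ), which are the forms consistent with the 10 log(9/8) limit and the zero degradation at K = 3 quoted above; the function names are hypothetical.

```python
from math import log10

def crlb(K, snr):
    """CRLB on the frequency estimate variance in (radians/T)**2, eq. (37)."""
    return 6.0 / (K * (K ** 2 - 1) * snr)

def high_snr_variance(K, snr):
    """High SNR variance of the final DMA branch, eq. (36)."""
    return 27.0 / (4.0 * K ** 3 * snr)

def degradation_db(K, snr=100.0):
    """Degradation relative to the CRLB in dB; the SNR cancels in the ratio."""
    return 10.0 * log10(high_snr_variance(K, snr) / crlb(K, snr))

deg_3 = degradation_db(3)       # shortest observation, B = 2
deg_48 = degradation_db(48)     # observation length used in the simulations
```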


The CRLB, as given in (37), applies to the original K
received samples, {r_k}, and is valid for the MPSK signal
model used with any value for M. For the pure carrier case,
without a nonlinearity (i.e. M_a = M = 1), a CRLB can also be
derived for each set of K_b decimated samples at the input to
the b-th branch of Figure 2. The result is

    CRLB_b = (K_b/K)^2 CRLB(K_b, \gamma_b),  b = 1...B        (39)

where K and K_b are defined in (13) and (18). The scale
factor in (39) is required to convert from decimated sample
periods to original sample periods, T, to preserve the units
of (radians/T)^2. The γ_b term represents the sample SNR at
the input to the b-th branch. Using the same arguments as
for (33), a good approximation for γ_b, for all SNRs above
threshold, is

    \gamma_b = (K/K_b) \gamma,  b = 1...B        (40)

where K and K_b are again given by (13) and (18).
Simplifying (39) further gives

    CRLB_b = \frac{6}{K (K^2 - (K/K_b)^2) \gamma},  b = 1...B        (41)

The degradation in the CRLB, measured in dB for the b-th
branch, where K_b decimated samples are used instead of the
original K samples, is given by

    Deg_b = 10 \log( \frac{CRLB_b}{CRLB} ) = 10 \log( \frac{K_b^2 (1 - K^{-2})}{K_b^2 - 1} )   dB,  b = 1...B        (42)

For K >> 1, the degradation is approximately given by

    Deg_b(K \gg 1) \approx 10 \log( \frac{K_b^2}{K_b^2 - 1} )   dB,  b = 1...B        (43)

Representative examples of the degradations in the CRLB
for K_b = 3, 6 and 12 are 0.51, 0.12 and 0.03 dB, respectively.
The degradation in the CRLB is clearly negligible for
K_b ≥ 12. Note that the ML extension described earlier can
be applied to any set of K_b decimated samples (e.g. K_b = 12),
and not just to the initial set of K samples. Thus, for large
values of K, the complexity of the ML extension can be
reduced to a fixed constant, independent of K, with
negligible degradation in performance. Thus, the
complexity of the complete frequency estimator with the
ML extension remains approximately 3K.
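
The representative degradations quoted above follow directly from the large K approximation, 10 log(K_b^2 / (K_b^2 - 1)). A minimal Python check, with a hypothetical function name:

```python
from math import log10

def decimated_crlb_degradation_db(K_b):
    """Large-K degradation of the CRLB when only K_b decimated samples are used."""
    return 10.0 * log10(K_b ** 2 / (K_b ** 2 - 1))

# Degradations in dB for K_b = 3, 6 and 12, rounded to two decimals.
table = [round(decimated_crlb_degradation_db(K_b), 2) for K_b in (3, 6, 12)]
```

The rapid decay with K_b is what allows the ML extension to run on a small decimated set at essentially no cost in accuracy.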


EXAMPLE PERFORMANCE RESULTS 

The simulated performance results are presented in
terms of measured root-mean-squared (RMS) frequency
error in (cycles/T) versus sample SNR, γ, in dB. An
observation time of K = 48 samples was used, and 5000
independent trials were simulated for each SNR. Figure 3
shows the results for the case of pure carrier with no
nonlinearity (M_a = M = 1). Three sets of simulation results
are shown. The first set, with d = 1, is for the single branch
estimator of Figure 1 or the first branch in Figure 2. The
second set, with d_B = 32, is for the final branch of the new
DMA estimator of Figure 2. The third set is for the ML
extension applied to the original K = 48 samples. The
performance is essentially the same for a decimated set of
12 or more samples. Also shown, for comparison, are the
corresponding theoretical approximations and the CRLB.
It is observed that the theoretical approximations are quite
accurate for all SNRs above threshold. With the ML
extension, the CRLB is essentially achieved for all SNRs
above threshold. The threshold SNR is observed to be
about 0 dB for this case.



Figure 3: RMS frequency error versus sample SNR, γ, for
pure carrier (M_a = M = 1, d_max = d_B).

Figures 4 and 5 show simulation results for BPSK and
QPSK signaling, respectively. For the simulated BPSK
results in Figure 4, M = 2 and M_a = 1. For the simulated
QPSK results in Figure 5, M = 4 and M_a = 1. Not shown are
the simulation results with M_a = M, but they closely match
the theoretical approximations for all SNRs above
threshold. The simulation results with M_a = 1 are clearly
better than the theoretical approximations with M_a = M.
Note that the simulated performance of the DMA estimator
with RAD remains within about 0.5 dB of the CRLB for all
SNRs above threshold, and that the CRLB is essentially
achieved with the ML extension. As expected, the
threshold SNRs are much higher with M > 1. Longer
observation times are required to provide lower thresholds.



Figure 4: RMS frequency error versus sample SNR, γ, for
BPSK signaling (M = 2, M_a = 1, d_max = d_B).

CONCLUSIONS 

A low-complexity, open-loop, discrete-time,
delay-multiply-average (DMA) approach to estimating the
frequency offsets for MPSK modulated signals was
investigated. A simple maximum likelihood (ML)
extension was also considered. Theoretical and simulated
performance results were presented and compared to the
Cramer-Rao lower bound (CRLB) for the variance of the
frequency estimation error. It was shown that the
frequency estimate variance can be improved by orders of
magnitude over that obtained with a delay of d = 1. Without
the ML extension, performance is typically within about
0.5 dB of the CRLB, for all SNRs above threshold. With
the ML extension, the CRLB is essentially achieved. The
complexity of the new DMA algorithm, with or without the
ML extension, is approximately 3K, where K is the
observation time in samples.



Figure 5: RMS frequency error versus sample SNR, γ, for
QPSK signaling (M = 4, M_a = 1, d_max = d_B).